Experiment

The experiment is configured in an experiment log file (an Excel file, in my case, with different tabs).

  1. Reading the experiment configuration (Experiment_name) from the experiment log file (Experiments_file). The Target and Dataset columns in AllExperiments_tab contain the name of the data file used and the target column.
  2. Models based on individual datasets to be created, trained and compared in the experiment (Experiment_Features_tab): a table whose first column is the model name (should be unique) and whose next columns [1:51] are the features used to train the model. A feature is either an exact column name from the dataset or a calculation based on exact column names, evaluated with the pandas `eval` function.
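A feature given as an expression can be computed with `DataFrame.eval`. A minimal sketch of that convention (the function name and the "exact column name, otherwise eval expression" rule are my assumptions based on the description above):

```python
import pandas as pd

def build_model_frame(df, feature_specs, target):
    """Assemble the training frame for one model.

    Each spec is either an exact column name from the dataset or a
    pandas-eval expression over exact column names, e.g. "colA / colB".
    """
    out = pd.DataFrame(index=df.index)
    for spec in feature_specs:
        if spec in df.columns:
            out[spec] = df[spec]          # plain column
        else:
            out[spec] = df.eval(spec)     # computed feature
    out[target] = df[target]
    return out
```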

This configuration is used to preprocess the data and also needs to be copied to S3 in CSV format, so that a preprocessing script can read it easily if we use AWS SKLearnProcessor jobs/instances.
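Converting the configuration tabs to CSV and copying them to S3 could look like this (tab names, file layout, and the bucket/prefix arguments are assumptions; `boto3.client("s3").upload_file` is the standard boto3 call):

```python
import pandas as pd

def config_tabs_to_csv(xlsx_path, tab_names, out_dir="."):
    """Read each configuration tab from the experiment log and write it as CSV."""
    csv_paths = []
    for tab in tab_names:
        df = pd.read_excel(xlsx_path, sheet_name=tab)
        csv_path = f"{out_dir}/{tab}.csv"
        df.to_csv(csv_path, index=False)
        csv_paths.append(csv_path)
    return csv_paths

def upload_configs(csv_paths, bucket, prefix):
    """Copy the CSVs to S3 so an SKLearnProcessor job can read them."""
    import boto3  # assumed available on the SageMaker instance
    s3 = boto3.client("s3")
    for path in csv_paths:
        key = f"{prefix}/{path.rsplit('/', 1)[-1]}"
        s3.upload_file(path, bucket, key)
```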

2a. Preprocessed data may already exist in S3. The experiment configuration can provide the list of files per model; in this case (`len(preprocessed_data) == 0`) the code skips all data-preprocessing steps.

  3. Model parameters to be used in training: a table whose first column is the model name (should be unique and must correspond to the models in Experiment_Features_tab) and whose next columns are XGBoost parameters. In the general case, all models can share the same parameters.
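Pulling one model's row from the parameters table into a kwargs dict could look like this (the function name is mine; the column layout follows the description above):

```python
import pandas as pd

def params_for_model(params_df, model_name):
    """Return the XGBoost parameters for one model as a dict.

    The first column of the table is the model name; all remaining
    columns are parameter values (empty cells are dropped).
    """
    match = params_df[params_df.iloc[:, 0] == model_name]
    if match.empty:
        raise KeyError(f"Unknown model: {model_name}")
    return match.iloc[0, 1:].dropna().to_dict()
```

The resulting dict can then be unpacked into the model constructor, e.g. `xgb.XGBRegressor(**params)`.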

4. Verification that both configurations contain the same set of models.
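The verification step can be a simple set comparison over the first column of both tables (a sketch; the function name and error message are mine):

```python
import pandas as pd

def verify_model_sets(features_df, params_df):
    """Check that the features table and the parameters table describe
    the same set of models (both keep the model name in column 0)."""
    features_models = set(features_df.iloc[:, 0])
    params_models = set(params_df.iloc[:, 0])
    mismatch = features_models ^ params_models  # symmetric difference
    if mismatch:
        raise ValueError(f"Model sets differ between configurations: {sorted(mismatch)}")
    return sorted(features_models)
```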

Data preprocessing

Preprocessing output (the training and testing datasets) is saved separately for each model, in a folder with the same name as the model name configured in the experiment.
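The per-model folder layout can be sketched like this (the `train.csv`/`test.csv` file names and the output root are my assumptions):

```python
import os
import pandas as pd

def save_model_splits(model_name, train_df, test_df, out_root="preprocessed"):
    """Save the preprocessing output for one model into a folder
    named after the model, as configured in the experiment."""
    folder = os.path.join(out_root, model_name)
    os.makedirs(folder, exist_ok=True)
    train_df.to_csv(os.path.join(folder, "train.csv"), index=False)
    test_df.to_csv(os.path.join(folder, "test.csv"), index=False)
    return folder
```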

Model training

Experiment results

This is a separate block. If the training above takes a long time and runs in background mode rather than in a Python notebook, this block can be run from a notebook from time to time to monitor progress.

1. Testing fold scores

2. Training and validation errors from folds:

  1. Feature importance, if it was generated during CV

3. Visualizations aggregated from the folds' best-model scores:

  1. Corrected t-test comparing the VALIDATION scores of individual folds in a chosen model against the folds of the other models
  1. Corrected confidence interval of the difference between model VALIDATION scores
  1. Student's t-test comparing the VALIDATION scores of individual folds in a chosen model against the folds of the other models
  1. Confidence interval of the difference between model VALIDATION scores
  1. t-test comparing the TEST scores of individual folds in a chosen model against the folds of the other models
  1. Confidence interval of the difference between model TEST scores

The difference between the means of the model scores for the entire population lies within this confidence interval. If there is no difference, the interval contains zero (0). If zero is NOT within the interval, the difference is statistically significant.
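The corrected t-test and confidence interval above are usually computed with the Nadeau-Bengio correction, which inflates the variance of the per-fold score differences to account for overlapping CV training sets. A standard-formula sketch (not taken from the experiment code itself; fold counts and the `n_test/n_train` ratio come from the CV setup):

```python
import math
from scipy import stats

def corrected_ttest(scores_a, scores_b, n_train, n_test, alpha=0.05):
    """Nadeau-Bengio corrected paired t-test on per-fold scores.

    Replaces the usual 1/k variance factor with (1/k + n_test/n_train).
    Returns the t statistic, the two-sided p-value, and the corrected
    confidence interval for the mean score difference.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)
    # corrected standard error of the mean difference
    se = math.sqrt((1 / k + n_test / n_train) * var)
    t_stat = mean / se
    p_value = 2 * stats.t.sf(abs(t_stat), df=k - 1)
    t_crit = stats.t.ppf(1 - alpha / 2, df=k - 1)
    ci = (mean - t_crit * se, mean + t_crit * se)
    return t_stat, p_value, ci
```

If the returned interval excludes zero, the score difference is statistically significant, matching the interpretation above.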

4. Training and validation errors (output from the model), to estimate overfitting

Hyperparameter visualizations